Some background notes

Wrangling and data cleaning

  • Merged metadata sources from all field collections
  • Various cleaning and whatnot
  • Spatial clustering of points to assign “populations”
    • For now, all analyses are done at a 100km spatial clustering. This is very crude, but easy to understand and work with. I think it’s fair enough and a good balance to allow distinct putative populations without too many assumptions or really thinning sample sizes per cluster (they are already thin in some places!). For site-level analyses, see the COLONY notes below

COLONY/pedigree analysis

  • Ran COLONY for each 100km cluster, separated by year (i.e. excluded “impossible” relationships, made COLONY run smooth that way!)
  • Wrangled COLONY outputs and merged it back to the genotype data
    • Although colonies are assigned at the 100km cluster, they can easily be examined at smaller spatial scales (e.g. filter dataset to only look at MN Zoo even though that’s clustered with Twin Cities generally. Then can examine # of unique colonies at the zoo, etc)
  • That is what I did for the downstream stuff. It works well.

Potential decisions that could be reconsidered

Leaving these here as notes, but I feel comfortable with the decisions I’ve made.


Preliminary Results

Data Availability and Map of collections

  • Total specimens in USDA database: 665
  • Total specimens with ANY genetic data in USDA database: 498
  • Specimens after basic filtering: 470
    • Had known lat-long, sex, and other basic metadata
    • Was from 2020 or 2021
    • Other minor filtering
  • Specimens used in COLONY: 308
    • Females only
    • Had at least 10 loci with data (vast majority have 13)
    • Was NOT from a known colony (this removes quite a few specimens)
  • Specimens in pop-gen analyses: 231
    • Same filters as COLONY and
    • Only one individual per detected colony
  • Number of males: 116

Figure 1. Map showing the collection locations of 470 specimens available after basic initial filtering. Colors represent “population cluster” as assigned by tree method (used hclust and cutree; tree length = 100km). Hover over the dots to see what cluster they were assigned to!

Quality checks and summary statistics

HW Equilibrium

See Shalene’s paper on page 998 for inspiration on how to deal with this…Jha et al. 2015 Contemporary human-altered landscapes and oceanic barriers reduce bumble bee gene flow

##               chi^2  df  Pr(chi^2 >) Pr.exact
## btern01    97.66759  36 1.343170e-07    0.000
## bt28       59.31202  10 4.889876e-09    0.000
## b96       105.11921  28 7.376899e-11    0.000
## bt30      113.92536  66 2.288810e-04    0.000
## btms0081   13.89628   3 3.049781e-03    0.003
## btms0066  421.14552 276 3.999226e-08    0.000
## btms0083 1010.34228 465 0.000000e+00    0.000
## b126      313.28115 120 0.000000e+00    0.000
## btms0062 1146.60599 561 0.000000e+00    0.000
## btern02   954.22326 300 0.000000e+00    0.000
## btms0086   50.77823   3 5.454315e-11    0.000
## bl13       58.50354   3 1.227130e-12    0.000
## btms0059  112.09174  36 9.687809e-10    0.001

There is evidence that, globally, loci are out of HWE. But this is due to differences between subpopulations. Shalene handles this nicely in the paper noted above, so it’s something to report…but not really anything to worry about downstream.

Linkage Disequilibrium

LD between loci - not likely an issue here (p.rD and p.Ia > 0.05; no consistent linkage in the pairwise comparison either).

##          Ia        p.Ia       rbarD        p.rD 
## 0.054012341 0.053000000 0.004580062 0.053000000

Null alleles

Null allele frequences are within the range typical for pop gen analysis with microsatellites and unlikely to be an issue. Can cite https://www.nature.com/articles/6800545 and write it like they do in https://peerj.com/articles/13565/ where they say, “Nevertheless, the frequency of null alleles inferred with the methods of Brookfield and Chakraborty (0.09 and 0.13, respectively) is in line with values commonly reported in the literature and is unlikely to cause a major bias in downstream population structure analyses (Dakin & Avise, 2004).”

##                       btern01       bt28        b96       bt30   btms0081
## Observed frequency 0.07366541 0.07133358 0.09233988 0.09316057 0.06801660
## Median frequency   0.07226019 0.07150322 0.09097403 0.09317786 0.06711364
## 2.5th percentile   0.03601541 0.02964520 0.05142277 0.05409726 0.02201699
## 97.5th percentile  0.11281467 0.11534776 0.13633140 0.13493682 0.11317655
##                      btms0066   btms0083       b126   btms0062   btern02
## Observed frequency 0.06941596 0.13619032 0.09291617 0.12542148 0.1480633
## Median frequency   0.06895868 0.13399140 0.09147973 0.12406915 0.1455649
## 2.5th percentile   0.03505703 0.09892215 0.05678336 0.08843347 0.1080696
## 97.5th percentile  0.10267418 0.17449799 0.13111724 0.16409324 0.1863372
##                      btms0086       bl13   btms0059
## Observed frequency 0.05457863 0.08153726 0.06513655
## Median frequency   0.05238583 0.07891641 0.06337953
## 2.5th percentile   0.01546550 0.03615517 0.02414679
## 97.5th percentile  0.09448964 0.12691457 0.10382729

Missing loci

There are no loci with <80% completeness

## named numeric(0)

Individuals with poor data

There are no individuals with poor data in the dataset…but that is because I also excluded them earlier in the process. So this is just a sanity check.

## named numeric(0)

Duplicates or clones?

There are no duplicates or clones in the dataset. It’s possible that there are prior to the COLONY filtering, so if re-running analysis keeping siblings in the dataset, double check.

## #############################
## # Number of Individuals:  231 
## # Number of MLG:  231 
## #############################
## [1] 231

Polymorphic

Just a little sanity check that all the loci are polymorphic (they are)

##    Mode    TRUE 
## logical      13

Population Genetics Analysis

Summary Stats and Basic Genetic Diversity Measurements

Summary Statistics

Table of various summary stats

From the summary call to the genind object, we find the number of specimens per cluster, the number of private alleles (which is not very informative as it’s correlated with sample size), and the rarefied allelic richness (which may be informative). It’s quite debatable how well the rarefied richness does when dealing with such uneven population sizes. This is a case where re-running with different clusters might be more effective (i.e. splitting apart the big ones and then dropping the small ones for this analysis)

Observed v. Expected Heterozygosity

Only displaying populations with 5 or more specimens.

Figure 2. Observed (blue) versus expected (grey) heterozygosity across the regions. Numbers above bars are the sample size for each region.

Same (except ALL clusters, not just those with >=5 specimens) as figure but in table format for investigating to your heart’s content. Added some math for a little helper.

SAME AS BELOW BUT LOOKING AT He BUT BE CAUTIOUS UNTIL RE-DOING THIS AT STRATA LEVEL modeling He with random effects structure NEED TO DOUBLE CHECK IS IS APPROPRIATE I assigned regional clusters here, and took the mean of the Ar across sites within the region…but it might be wrong to do that and might need to calculate that on the regions themselves at a higher level (i.e. by defining strata in genind object)

Would also be more interesting if there was an actual hypothesis here rather than just are the regions different…

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: he ~ long_m_center + lat_m_center + (1 | locus)
##    Data: df_hs
## 
##      AIC      BIC   logLik deviance df.resid 
##   -230.0   -214.3    120.0   -240.0      164 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3771 -0.4835  0.1428  0.4909  2.9513 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  locus    (Intercept) 0.04642  0.2155  
##  Residual             0.01034  0.1017  
## Number of obs: 169, groups:  locus, 13
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)   3.684e-01  3.274e-01   1.125
## long_m_center 5.924e-08  4.465e-08   1.327
## lat_m_center  6.142e-08  6.547e-08   0.938
## 
## Correlation of Fixed Effects:
##             (Intr) lng_m_
## long_m_cntr -0.762       
## lat_m_centr -0.982  0.757
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
## Data: df_hs
## Models:
## mod_null_he: he ~ 1 + (1 | locus)
## mod_he: he ~ long_m_center + lat_m_center + (1 | locus)
##             npar     AIC    BIC logLik deviance  Chisq Df Pr(>Chisq)
## mod_null_he    3 -232.19 -222.8 119.10  -238.19                     
## mod_he         5 -229.95 -214.3 119.98  -239.95 1.7602  2     0.4147
##             R2m       R2c
## [1,] 0.00191564 0.8181946

modeling allelic richness with random effects structure NEED TO DOUBLE CHECK IS IS APPROPRIATE I assigned regional clusters here, and took the mean of the Ar across sites within the region…but it might be wrong to do that and might need to calculate that on the regions themselves at a higher level (i.e. by defining strata in genind object)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: richness ~ long_m_center + (1 | locus)
##    Data: df_ar
## 
##      AIC      BIC   logLik deviance df.resid 
##    391.3    403.8   -191.7    383.3      165 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.9098 -0.5344  0.1968  0.5939  2.1752 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  locus    (Intercept) 1.8945   1.3764  
##  Residual             0.4124   0.6422  
## Number of obs: 169, groups:  locus, 13
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)   3.755e+00  3.896e-01   9.638
## long_m_center 2.390e-07  1.844e-07   1.296
## 
## Correlation of Fixed Effects:
##             (Intr)
## long_m_cntr -0.154
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
## Data: df_ar
## Models:
## mod_null_ar: richness ~ 1 + (1 | locus)
## mod_ar: richness ~ long_m_center + (1 | locus)
##             npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## mod_null_ar    3 390.97 400.36 -192.49   384.97                     
## mod_ar         4 391.30 403.82 -191.65   383.30 1.6705  1     0.1962
##             R2m       R2c
## [1,] 0.00178405 0.8215358
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: mean_fis ~ region + (1 | locus)
##    Data: df_fis
## 
##      AIC      BIC   logLik deviance df.resid 
##     50.0     65.5    -20.0     40.0      160 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.70211 -0.62476 -0.07558  0.62004  2.78966 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  locus    (Intercept) 0.0000   0.0000  
##  Residual             0.0746   0.2731  
## Number of obs: 165, groups:  locus, 13
## 
## Fixed effects:
##                   Estimate Std. Error t value
## (Intercept)        0.03863    0.07575   0.510
## regionCentral      0.19941    0.07922   2.517
## regionTwin Cities  0.14898    0.10713   1.391
## 
## Correlation of Fixed Effects:
##             (Intr) rgnCnt
## regionCntrl -0.956       
## reginTwnCts -0.707  0.676
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
## Data: df_fis
## Models:
## mod_null_fis: mean_fis ~ 1 + (1 | locus)
## mod_fis: mean_fis ~ region + (1 | locus)
##              npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)  
## mod_null_fis    3 52.371 61.689 -23.185   46.371                       
## mod_fis         5 49.980 65.510 -19.990   39.980 6.3902  2    0.04096 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##             R2m        R2c
## [1,] 0.03821111 0.03821111
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lmer(formula = mean_fis ~ region + (1 | locus), data = df_fis, 
##     REML = FALSE)
## 
## Linear Hypotheses:
##                                Estimate Std. Error z value Pr(>|z|)  
## Central - Appalachian == 0      0.19941    0.07922   2.517   0.0301 *
## Twin Cities - Appalachian == 0  0.14898    0.10713   1.391   0.3358  
## Twin Cities - Central == 0     -0.05044    0.07922  -0.637   0.7936  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: he ~ region + (1 | locus)
##    Data: df_hs_reg
## 
##      AIC      BIC   logLik deviance df.resid 
##    -47.6    -39.3     28.8    -57.6       34 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2034 -0.2685  0.1869  0.3937  1.4637 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  locus    (Intercept) 0.043783 0.2092  
##  Residual             0.004198 0.0648  
## Number of obs: 39, groups:  locus, 13
## 
## Fixed effects:
##                  Estimate Std. Error t value
## (Intercept)      0.683800   0.060753  11.255
## regioncentral    0.004731   0.025415   0.186
## regionnorth west 0.016215   0.025415   0.638
## 
## Correlation of Fixed Effects:
##             (Intr) rgncnt
## regioncntrl -0.209       
## regnnrthwst -0.209  0.500
## Data: df_hs_reg
## Models:
## mod_null_he_reg: he ~ 1 + (1 | locus)
## mod_he_reg: he ~ region + (1 | locus)
##                 npar     AIC     BIC logLik deviance  Chisq Df Pr(>Chisq)
## mod_null_he_reg    3 -51.174 -46.183 28.587  -57.174                     
## mod_he_reg         5 -47.601 -39.283 28.801  -57.601 0.4271  2     0.8077
##               R2m       R2c
## [1,] 0.0009905828 0.9125857
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lmer(formula = he ~ region + (1 | locus), data = df_hs_reg, REML = FALSE)
## 
## Linear Hypotheses:
##                               Estimate Std. Error z value Pr(>|z|)
## central - appalachian == 0    0.004731   0.025415   0.186    0.981
## north west - appalachian == 0 0.016215   0.025415   0.638    0.799
## north west - central == 0     0.011485   0.025415   0.452    0.894
## (Adjusted p values reported -- single-step method)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: ar ~ region + (1 | locus)
##    Data: df_ar_reg
## 
##      AIC      BIC   logLik deviance df.resid 
##    184.1    192.4    -87.0    174.1       34 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.8636 -0.4774 -0.1023  0.7252  1.5097 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  locus    (Intercept) 23.685   4.867   
##  Residual              1.346   1.160   
## Number of obs: 39, groups:  locus, 13
## 
## Fixed effects:
##                  Estimate Std. Error t value
## (Intercept)        7.5385     1.3876   5.433
## regioncentral      0.9615     0.4551   2.113
## regionnorth west   1.0744     0.4551   2.361
## 
## Correlation of Fixed Effects:
##             (Intr) rgncnt
## regioncntrl -0.164       
## regnnrthwst -0.164  0.500
## Data: df_ar_reg
## Models:
## mod_null_ar_reg: ar ~ 1 + (1 | locus)
## mod_ar_reg: ar ~ region + (1 | locus)
##                 npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)  
## mod_null_ar_reg    3 186.06 191.06 -90.033   180.06                       
## mod_ar_reg         5 184.08 192.40 -87.040   174.08 5.9861  2    0.05013 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##              R2m       R2c
## [1,] 0.009437976 0.9467198
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lmer(formula = ar ~ region + (1 | locus), data = df_ar_reg, REML = FALSE)
## 
## Linear Hypotheses:
##                               Estimate Std. Error z value Pr(>|z|)  
## central - appalachian == 0      0.9615     0.4551   2.113   0.0874 .
## north west - appalachian == 0   1.0744     0.4551   2.361   0.0479 *
## north west - central == 0       0.1129     0.4551   0.248   0.9667  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: fis ~ region + (1 | locus)
##    Data: df_fis_reg
## 
##      AIC      BIC   logLik deviance df.resid 
##    -52.3    -44.0     31.1    -62.3       34 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.7970 -0.5892  0.0631  0.4268  2.2589 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  locus    (Intercept) 0.003491 0.05909 
##  Residual             0.009202 0.09593 
## Number of obs: 39, groups:  locus, 13
## 
## Fixed effects:
##                  Estimate Std. Error t value
## (Intercept)       0.03863    0.03125   1.236
## regioncentral     0.19694    0.03763   5.234
## regionnorth west  0.15480    0.03763   4.114
## 
## Correlation of Fixed Effects:
##             (Intr) rgncnt
## regioncntrl -0.602       
## regnnrthwst -0.602  0.500
## Data: df_fis_reg
## Models:
## mod_null_fis_reg: fis ~ 1 + (1 | locus)
## mod_fis_reg: fis ~ region + (1 | locus)
##                  npar     AIC    BIC logLik deviance  Chisq Df Pr(>Chisq)    
## mod_null_fis_reg    3 -36.161 -31.17 21.080  -42.161                         
## mod_fis_reg         5 -52.288 -43.97 31.144  -62.288 20.127  2   4.26e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##            R2m       R2c
## [1,] 0.3669564 0.5410691
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lmer(formula = fis ~ region + (1 | locus), data = df_fis_reg, 
##     REML = FALSE)
## 
## Linear Hypotheses:
##                               Estimate Std. Error z value Pr(>|z|)    
## central - appalachian == 0     0.19694    0.03763   5.234  < 1e-04 ***
## north west - appalachian == 0  0.15480    0.03763   4.114 0.000127 ***
## north west - central == 0     -0.04214    0.03763  -1.120 0.501742    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Population Structure

STRUCTURE

evanno method for finding “best” K value

typical STRUCTURE barplot

structure based pie chart map

This is more zoomed in than the above map for clarity. The top map is intended to actually show the historic range (states). Perhaps a better way would be to actually zoom in somehow rather than trim states.

Map of relative contribution of different clusters across regions. Pie size scaled by sample size (not linearly!). There is a dot in the middle of each pie, that is just so when I eventually edit this in inkscape (because there is no clear way to “repel” scatterpies) I have the centroid retained

Ultimately, I’m going to put the above barplot into the mapping area in the top-right corner. Then the order of the populations in that barplot will match the left-to-right order of the pies (after repelling them a bit for visual clarity)

DAPC

Both DAPC and STRUCTURE methods are most explanatory at k=3; similar results are obtained for k=4, but choosing 3 for sake of parsimony

##  #################################################
##  # Discriminant Analysis of Principal Components #
##  #################################################
## class: dapc
## $call: dapc.genind(x = genind_rpbb, n.pca = 60, n.da = 20)
## 
## $n.pca: 60 first PCs of PCA used
## $n.da: 12 discriminant functions saved
## $var (proportion of conserved variance): 0.884
## 
## $eig (eigenvalues): 43.34 28.59 20.37 15.31 13.33 ...
## 
##   vector    length content                   
## 1 $eig      12     eigenvalues               
## 2 $grp      231    prior group assignment    
## 3 $prior    13     prior group probabilities 
## 4 $assign   231    posterior group assignment
## 5 $pca.cent 182    centring vector of PCA    
## 6 $pca.norm 182    scaling vector of PCA     
## 7 $pca.eig  168    eigenvalues of PCA        
## 
##   data.frame    nrow ncol content                                          
## 1 $tab          231  60   retained PCs of PCA                              
## 2 $means        13   60   group means                                      
## 3 $loadings     60   12   loadings of variables                            
## 4 $ind.coord    231  12   coordinates of individuals (principal components)
## 5 $grp.coord    13   12   coordinates of groups                            
## 6 $posterior    231  13   posterior membership probabilities               
## 7 $pca.loadings 182  60   PCA loadings of original variables               
## 8 $var.contr    182  12   contribution of original variables

AMOVA

## $call
## ade4::amova(samples = xtab, distances = xdist, structures = xstruct)
## 
## $results
##                             Df    Sum Sq   Mean Sq
## Between pop                 12  224.7766 18.731387
## Between samples Within pop 218 2356.6080 10.810128
## Within samples             231 1688.2162  7.308295
## Total                      461 4269.6008  9.261607
## 
## $componentsofcovariance
##                                            Sigma          %
## Variations  Between pop                0.2402062   2.583024
## Variations  Between samples Within pop 1.7509166  18.828239
## Variations  Within samples             7.3082952  78.588737
## Total variations                       9.2994180 100.000000
## 
## $statphi
##                          Phi
## Phi-samples-total 0.21411263
## Phi-samples-pop   0.19327472
## Phi-pop-total     0.02583024
## class: krandtest lightkrandtest 
## Monte-Carlo tests
## Call: randtest.amova(xtest = amova_rpbb, nrepet = 999)
## 
## Number of tests:   3 
## 
## Adjustment method for multiple comparisons:   none 
## Permutation number:   999 
##                         Test       Obs   Std.Obs   Alter Pvalue
## 1  Variations within samples 7.3082952 -22.22865    less  0.001
## 2 Variations between samples 1.7509166  16.24033 greater  0.001
## 3     Variations between pop 0.2402062  11.98792 greater  0.001

Population Differentiation

F-Statistics

Pairwise Fst heat map. Not sure what’s up with the SE Minnesota population or Green Bay…but there’s also not that many specimens from them (8 and 7, respectively)

Below is with only populations having >=5 specimens

## 
## Call:
## lm(formula = Fst ~ distance, data = df_fst_join)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.032784 -0.014806 -0.004775  0.007285  0.093444 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.704e-02  4.725e-03   3.607 0.000686 ***
## distance    3.819e-08  9.292e-09   4.110 0.000138 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0225 on 53 degrees of freedom
## Multiple R-squared:  0.2417, Adjusted R-squared:  0.2274 
## F-statistic: 16.89 on 1 and 53 DF,  p-value: 0.0001381
## 
## Call:
## lm(formula = Fst ~ log(distance), data = df_fst_join)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.032783 -0.014908 -0.005903  0.005882  0.088357 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -0.178853   0.048422  -3.694 0.000524 ***
## log(distance)  0.016773   0.003846   4.361 5.99e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02217 on 53 degrees of freedom
## Multiple R-squared:  0.2641, Adjusted R-squared:  0.2502 
## F-statistic: 19.02 on 1 and 53 DF,  p-value: 5.994e-05

Inbreeding Evidence

UPDATE!!!!!!!!!!!!!!!!!!! Analysis of Males

Determine numbers of males using the minimum homozygosity method, discuss implications

## # A tibble: 4 × 3
##   sex    is_het     n
##   <chr>  <chr>  <int>
## 1 female no        20
## 2 female yes      327
## 3 male   no        52
## 4 male   yes       63
## [1] 0.1615385
## # A tibble: 4 × 3
##   sex    is_het     n
##   <chr>  <chr>  <int>
## 1 female no        29
## 2 female yes      318
## 3 male   no        89
## 4 male   yes       26
## [1] 0.0755814
## # A tibble: 4 × 3
##   site        threshold prop_diploid
##   <chr>           <dbl>        <dbl>
## 1 MN Zoo              1        0.175
## 2 MN Zoo              2        0.114
## 3 Appalachian         1        0.523
## 4 Appalachian         2        0.3

MNZoo 2020

## # A tibble: 3 × 3
##   sex    is_het     n
##   <chr>  <chr>  <int>
## 1 female yes       18
## 2 male   no         8
## 3 male   yes        1
## [1] 0.05263158
## # A tibble: 3 × 3
##   sex    is_het     n
##   <chr>  <chr>  <int>
## 1 female yes       18
## 2 male   no         8
## 3 male   yes        1
## [1] 0.05263158

mnzoo 2021

## # A tibble: 4 × 3
##   sex    is_het     n
##   <chr>  <chr>  <int>
## 1 female no         1
## 2 female yes       15
## 3 male   no         3
## 4 male   yes        6
## [1] 0.2857143
## # A tibble: 4 × 3
##   sex    is_het     n
##   <chr>  <chr>  <int>
## 1 female no         3
## 2 female yes       13
## 3 male   no         6
## 4 male   yes        3
## [1] 0.1875

Site-level Analyses (i.e. COLONY stuff)

UPDATE!!!!!!!!!!!!!!!!!!! COLONY abundance and results

Insert summary table, case studies of MnZoo, Turtle Valley, and other sites with >15? 10? whatever

MN Zoo 2020

## # A tibble: 1 × 6
##   n_genotyped n_colonies ml.colony.num CI.lower CI.upper prop_detected
##         <int>      <int>         <int>    <dbl>    <dbl>         <dbl>
## 1          18          9            16        9       31         0.562

MN Zoo 2021

## # A tibble: 1 × 6
##   n_genotyped n_colonies ml.colony.num CI.lower CI.upper prop_detected
##         <int>      <int>         <int>    <dbl>    <dbl>         <dbl>
## 1          16         13            39     19.5     1000         0.333

Turtle Valley 2021

## # A tibble: 1 × 6
##   n_genotyped n_colonies ml.colony.num CI.lower CI.upper prop_detected
##         <int>      <int>         <int>    <dbl>    <dbl>         <dbl>
## 1          42         27            66     49.5     121.         0.409